Section: Research Program
Filtering real-time Web streams
The Web is rapidly turning into a real-time information system, which forces us to revisit both how to assess the relevance of information for a user effectively and how to implement information retrieval and dissemination functionality efficiently. To capture the various contextual aspects of user needs and of the information shared on the real-time Web, we have to consider, besides content and social relevance, implicit user feedback (e.g., pageviews) and explicit user feedback (e.g., like, retweet, or reply events). To accommodate high arrival rates of information items (e.g., 100 million tweets per day) and of user events (e.g., billions of pageviews per day), we are exploring a publish/subscribe paradigm in which we index queries and update their results on the fly each time a new item or a relevant event arrives. In this respect, we need to process continuous top-k text queries that combine query-dependent scores (such as text similarity) and query-independent scores (such as social relevance or user attention) with time-decay functions.
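As an illustration only, the following Python sketch shows one way such a continuous top-k query could be evaluated in a publish/subscribe setting. It is not the system described here: the class name ContinuousTopK, the Jaccard text similarity, the saturating attention score, the linear weight alpha, and the exponential decay with a half_life parameter are all assumptions chosen to keep the example small. Queries are indexed by term, so each incoming item is matched only against the queries that share a term with it.

```python
import math
from collections import defaultdict


class ContinuousTopK:
    """Toy publish/subscribe engine (hypothetical names and scoring choices)."""

    def __init__(self, k=2, alpha=0.5, half_life=3600.0):
        self.k = k
        self.alpha = alpha                        # weight of the query-dependent score
        self.decay = math.log(2) / half_life      # exponential decay rate (assumed form)
        self.queries = {}                         # query id -> set of terms
        self.index = defaultdict(set)             # term -> ids of queries containing it
        self.items = {}                           # item id -> [terms, timestamp, event count]
        self.candidates = defaultdict(set)        # query id -> ids of items sharing a term

    def subscribe(self, qid, text):
        """Register a continuous query and index it by its terms."""
        terms = set(text.lower().split())
        self.queries[qid] = terms
        for term in terms:
            self.index[term].add(qid)

    def publish(self, item_id, text, timestamp):
        """Match a new item against the indexed queries only."""
        terms = set(text.lower().split())
        self.items[item_id] = [terms, timestamp, 0]
        for term in terms:
            for qid in self.index.get(term, ()):
                self.candidates[qid].add(item_id)

    def feedback(self, item_id):
        """Record one user event (e.g., a like or a pageview) on an item."""
        self.items[item_id][2] += 1

    def score(self, qid, item_id, now):
        terms, timestamp, events = self.items[item_id]
        query = self.queries[qid]
        similarity = len(query & terms) / len(query | terms)  # Jaccard, in [0, 1]
        attention = 1.0 - 1.0 / (1.0 + events)                # saturates below 1
        combined = self.alpha * similarity + (1.0 - self.alpha) * attention
        return combined * math.exp(-self.decay * (now - timestamp))

    def top_k(self, qid, now):
        """Return the k best item ids for a query at time `now`."""
        scored = [(self.score(qid, i, now), i) for i in self.candidates[qid]]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [item_id for _, item_id in scored[: self.k]]


engine = ContinuousTopK(k=2, alpha=0.5, half_life=3600.0)
engine.subscribe("q1", "real-time web")
engine.publish("a", "real-time web streams", timestamp=0)
engine.publish("b", "web filtering", timestamp=0)
print(engine.top_k("q1", now=0))  # ['a', 'b']
```

In this sketch the ranking is recomputed at read time for simplicity. A real implementation would instead maintain each result list incrementally as items and events arrive, which is the point of indexing queries rather than items.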